Sheffield University: CLEF 2000 Submission (Bilingual Track - German to English)

Authors

  • Tim Gollins
  • Mark Sanderson
Abstract

We investigated dictionary-based cross-language information retrieval using lexical triangulation. Lexical triangulation combines the results of different transitive translations. Transitive translation uses a pivot language to translate between two languages when no direct translation resource is available. We took German queries and translated them via Spanish or Dutch into English. We compared the results of retrieval experiments using these queries with other versions created by combining the transitive translations or created by direct translation. Direct dictionary translation of a query introduces considerable ambiguity that damages retrieval, giving an average precision 79% below monolingual in this research. Transitive translation introduces more ambiguity, giving results more than 88% below direct translation. We have shown that lexical triangulation between two transitive translations can eliminate much of the additional ambiguity introduced by transitive translation.

Introduction and Background

Cross Language Information Retrieval (CLIR) addresses the situation where the query that a user presents to an IR system is not in the same language as the corpus of documents they wish to search. This situation presents a number of challenges (Grefenstette (1998)), but primary amongst these is the problem of crossing the language barrier (Schauble & Sheridan (1997)). Almost all the approaches to this problem require access to some form of rich translation resource to map terms in the query language (the source) to terms in the corpus (the target). “Transitive” CLIR aims to address the situation where there are limited direct translation resources available (Ballesteros (2000)). A transitive CLIR system translates the source language terms by first translating the terms into an intermediate or "pivot" language and then translating the resulting terms into the target language.
Thus, a transitive system could translate a query from German to English via either Dutch or Spanish. The main aim of this work is to combine translations from two different transitive routes to discover whether this can reduce the ambiguity introduced by transitive translation. Ballesteros suggested the possibility of using this approach in the summary to her recent chapter (Ballesteros (2000)). We have chosen to call this approach “lexical triangulation” (see figure 1). We have chosen to simulate a Machine-Readable Dictionary (MRD) approach to CLIR. This follows on from the work of Ballesteros & Croft (1996, 1997, 1998) and Ballesteros (2000).

The Experimental Environment

The underlying IR system used in the Sheffield submission was the GLASS system (Sanderson (2000)). The translation resources were derived from the German, Spanish, Dutch, and English components of EuroWordNet (Vossen (1999)). The data used to lemmatise the German queries was derived from the CELEX German databases.
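
The idea of lexical triangulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionaries below are tiny hand-made toy entries (not taken from EuroWordNet), the function names are hypothetical, and it assumes that "combining" two transitive translations means keeping only the English terms that both pivot routes agree on.

```python
from typing import Dict, Set

# Toy bilingual dictionaries; all entries are illustrative assumptions.
Lexicon = Dict[str, Set[str]]

DE_NL: Lexicon = {"schloss": {"kasteel", "slot"}}
NL_EN: Lexicon = {"kasteel": {"castle"}, "slot": {"lock", "castle", "end"}}
DE_ES: Lexicon = {"schloss": {"castillo", "cerradura"}}
ES_EN: Lexicon = {"castillo": {"castle"}, "cerradura": {"lock"}}


def translate(terms: Set[str], lexicon: Lexicon) -> Set[str]:
    """Look up every term and pool all candidate translations."""
    result: Set[str] = set()
    for term in terms:
        result |= lexicon.get(term, set())
    return result


def transitive(term: str, to_pivot: Lexicon, from_pivot: Lexicon) -> Set[str]:
    """Translate source -> pivot -> target; ambiguity compounds at each hop."""
    return translate(translate({term}, to_pivot), from_pivot)


def triangulate(term: str) -> Set[str]:
    """Keep only target terms produced by both pivot routes (assumed rule)."""
    via_dutch = transitive(term, DE_NL, NL_EN)
    via_spanish = transitive(term, DE_ES, ES_EN)
    return via_dutch & via_spanish


via_dutch = transitive("schloss", DE_NL, NL_EN)
combined = triangulate("schloss")
print(sorted(via_dutch))
print(sorted(combined))
```

In this toy example the Dutch route introduces the spurious candidate "end", which the Spanish route does not produce, so the intersection drops it. Real dictionaries would also need handling for terms missing from one route, which this sketch ignores.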


Similar resources

DFKI's LT-lab at the CLEF 2005 Multiple Language Question Answering Track

This report describes the work done by the QA group of the Language Technology Lab at DFKI for the 2005 edition of the Cross-Language Evaluation Forum (CLEF). We describe the extensions made to our 2004 QA@CLEF German/English QA–system, especially the question–type driven selection of answer strategies. Furthermore, details concerning the processing of definition and temporal questions are desc...


University of Hagen at CLEF 2004: Indexing and Translating Concepts for the GIRT Task

This paper describes the work done at the University of Hagen for our participation at the German Indexing and Retrieval Test (GIRT) task of the CLEF 2004 evaluation campaign. We conducted both monolingual and bilingual information retrieval experiments. For monolingual experiments with the German document collection, the focus is on applying and comparing three indexing methods targeting full ...


Bilingual Information Retrieval with DesIRe and Internet Translation Services

DesIRe is the Dortmund extensible structured Information Retrieval engine. Its extensibility is based on the implementation of physical data independence; its query interface consists of datatypes with respective search predicates. This concept enabled us to add bilingual search predicates for the datatypes Text::English and Text::German (for English and German text, respectively). Our impleme...


First Participation of University and Hospitals of Geneva to Domain-Specific Track in CLEF 2008

In 2008 we participated in our first Domain-Specific Track, with the aim of establishing a baseline for our Information Retrieval engine in a domain unknown to us. We specialize in Natural Language Processing in the biomedical domain, and we have participated in the medical Image track and in TREC Genomics for four years with textual strategies, such as query expansion with controlled vocabularies, ...


The University of Amsterdam at CLEF 2002

This paper describes the official runs of our team for CLEF 2002. We took part in the monolingual tasks for each of the seven non-English languages for which CLEF provides document collections (Dutch, Finnish, French, German, Italian, Spanish, and Swedish). We also conducted our first experiments for the bilingual task (English to Dutch, and English to German), and took part in the GIRT and Ama...




Publication date: 2000